Stochastic Models for Surface Information Extraction in Texts

Authors

  • Massih-Reza Amini
  • Hugo Zaragoza
  • Patrick Gallinari
Abstract

This paper describes the application of numerical machine learning techniques to the extraction of information from a collection of textual data. More precisely, we model text sequences with Hidden Markov Models (HMMs) and Multi-Layer Perceptrons (MLPs) and show how these models can be used to perform specific surface extraction tasks (i.e., tasks that do not require in-depth syntactic or semantic analysis). We consider different text representations that use semantic and syntactic knowledge, and we analyze the influence of different grammatical constraints on the models using the MUC-6 corpus.
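As a rough illustration of how an HMM can label a text sequence for surface extraction, the sketch below runs Viterbi decoding over a sentence represented as coarse token classes. It is not the authors' model: the two states (`BG` for background, `TARGET` for text to extract), the token classes, and every probability are hypothetical values chosen only to make the example runnable.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for a non-empty obs under a discrete HMM.

    All probabilities are assumed to be strictly positive.
    """
    # Work in log space to avoid underflow on long sequences.
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
              for s in states}]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            cur[s] = (prev[best] + math.log(trans_p[best][s])
                      + math.log(emit_p[s][o]))
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Hypothetical two-state tagger: all numbers below are illustrative only.
states = ("BG", "TARGET")
start_p = {"BG": 0.8, "TARGET": 0.2}
trans_p = {
    "BG": {"BG": 0.8, "TARGET": 0.2},
    "TARGET": {"BG": 0.3, "TARGET": 0.7},
}
emit_p = {
    "BG": {"NOUN": 0.2, "VERB": 0.4, "PROPER": 0.05, "OTHER": 0.35},
    "TARGET": {"NOUN": 0.2, "VERB": 0.05, "PROPER": 0.65, "OTHER": 0.1},
}

obs = ["OTHER", "PROPER", "PROPER", "VERB", "OTHER"]
tags = viterbi(obs, states, start_p, trans_p, emit_p)
```

Tokens tagged `TARGET` would form the extracted span. In a trained system the transition and emission tables would be estimated from annotated text rather than set by hand.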

Similar resources

Learning for Sequence Extraction Tasks

We consider the application of machine learning techniques for sequence modeling to Information Retrieval (IR) and surface Information Extraction (IE) tasks. We introduce a generic sequence model and show how it can be used for dealing with different closed-query tasks. Taking into account the sequential nature of texts allows for a finer analysis than what is usually done in IR with static tex...

Application of Stochastic Optimal Control, Game Theory and Information Fusion for Cyber Defense Modelling

The present paper addresses an effective cyber defense model by applying information-fusion-based game-theoretical approaches. In the present paper, we try to improve previous models by applying stochastic optimal control and robust optimization techniques. Jump processes are applied to model different and complex situations in cyber games. Applying jump processes we propose some m...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Development of a regional attenuation relationship for Alborz, Iran

New attenuation relationships for rock and soil in Alborz have been developed in this study. When the quantity of usable ground-motion data is inadequate in the magnitude and distance ranges, development of an empirical prediction equation is deficient. Due to the lack of data, the two well-known simulation techniques, point-source and finite-fault models, have been used to generate more than ten t...

Coupled Hierarchical IR and Stochastic Models for Surface Information Extraction

We present in this paper a combination of machine-learning-based Information Retrieval (IR) techniques and stochastic language modelling in a hierarchical system that extracts surface information from text. At the lowest level of this hierarchy, documents and paragraphs are successively routed with IR techniques. At the top level, a stochastic language model extracts the most relevant phrases, ...

Publication date: 1999